
**Vision Goes Symbolic Without Loss of Information Within the Preattentive Vision Phase: The Need to Shift the Learning Paradigm from Machine-Learning (from Examples) to Machine-Teaching (by Rules) at the First Stage of a Two-Stage Hybrid Remote Sensing Image Understanding System, Part II: Novel Developments and Conclusions** 

> Andrea Baraldi *Department of Geography, University of Maryland, College Park, Maryland, USA*

# **1. Introduction**

The goal of this work is to revise, integrate and enrich previous analyses found in related papers about recent developments in the design and implementation of an operational automatic multi-sensor multi-resolution near real-time two-stage hybrid stratified hierarchical remote sensing (RS) image understanding system (RS-IUS) (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi, 2011a).

For publication reasons this work consists of two companion papers, Part I and Part II respectively. In Part I related papers, concepts and definitions are revised from existing literature to provide this work with a significant survey value and make it self-contained. The survey of past works is completed in Part II Section 2, where differences at the architectural level between different families of existing RS-IUSs, namely, multi-agent hybrid RS-IUSs, two-stage segment-based RS-IUSs and two-stage stratified hierarchical hybrid RS-IUSs, are highlighted.

The original contribution of Part II is to propose novel definitions of objective continuous sub-symbolic sensory data, continuous physical information, subjective discrete semi-symbolic data structure, discrete semantic-square (semantic<sup>2</sup>) information (which is naturally generated from the simultaneous combination of three components: (I) an objective continuous sensory data set, (II) an external subjective supervisor (observer) and (III) his/her own subjective prior ontology, equivalent to a model of the (3-D) world existing before looking at the objective sensory data at hand) and prior knowledge base.

In practical contexts the aforementioned original definitions imply the following.


Some practical conclusions of potential interest to the RS, computer vision (CV), artificial intelligence (AI) and machine learning (MAL) communities stem from these speculations. Firstly, in operational contexts (e.g., RS image classification problems at national/continental/global scale), other than toy problems (e.g., RS image mapping at coarse spatial resolution and local/regional scale), inductive classifiers capable of learning from a finite labeled data set are considered structurally inadequate to correlate (rather than extract, see this text above) discrete semantic<sup>2</sup> information with objective sensory data provided, *per se*, with no semantics at all.

Secondly, to increase the operational quality indicators (QIs) of existing two-stage hybrid RS-IUSs (namely, degree of automation, accuracy, efficiency, robustness to changes in input parameters, robustness to changes in the input data set, scalability, timeliness and economy), any first-stage inductive MAL-from-examples approach should be replaced by a deductive Machine Teaching (MAT)-by-rules approach capable of generating a preliminary classification first stage where small, but genuine image details are well preserved (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi, 2011a).

Thirdly, in RS-IUSs, MAL-from-data algorithms, either labeled (supervised) or unlabeled (unsupervised), either context-insensitive (e.g., pixel-based) or context-sensitive (e.g., 2-D object-based), should be adapted to work on a driven-by-knowledge stratified (semantic masked, layered) basis and moved to the second stage of a novel two-stage stratified hierarchical hybrid RS-IUS architecture recently proposed in RS literature (Baraldi et al., 2006a; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b).

As a proof of these concepts, the operational automatic multi-sensor multi-resolution near real-time Satellite Image Automatic Mapper™ (SIAM™), recently presented in RS literature<sup>1</sup> (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b), is adopted as first stage.

The rest of Part II of this work is organized as follows. Part II Section 3 discusses theoretical inconsistencies and algorithmic drawbacks found in Diamant's works (discussed in Part I Section 2.2 and Part I Section 2.5). Revised/novel definitions of objective continuous sensory data, continuous physical information, discrete semantic<sup>2</sup> information and prior knowledge are provided in Part II Section 4. In Part II Section 5 practical consequences of the novel definitions provided in Part II Section 4 are considered for CV, AI and MAL applications. Part II Section 6 presents the operational automatic multi-sensor multi-resolution near real-time SIAM™ as a proof of the original concepts proposed in this work. Conclusions are reported in Part II Section 7.

<sup>1</sup> SIAM™ - Patent pending - © Andrea Baraldi University of Maryland.

# **2. Related works (continued): Taxonomy of hybrid RS-IUS architectures**

As reported in Part I Section 2.1, there is a new trend of research and development in both CV (Cootes & Taylor, 2004) and RS literature (Matsuyama & Shang-Shouq Hwang, 1990; Shunlin Liang, 2004) to outperform existing scientific and commercial image understanding systems. This novel trend focuses on the development of quantitative hybrid models for retrieving sub-symbolic continuous variables (e.g., LAI) and symbolic categorical discrete variables (e.g., land cover composition) from multi-spectral (MS) imagery. By definition, hybrid models combine both statistical and physical models to take advantage of the unique features of each and overcome their shortcomings (see Part I Section 2.1). The study of hybrid quantitative models is also called AI systems integration. In this section, the taxonomy of hybrid RS-IUSs is summarized in line with (Baraldi et al., 2010a). It consists of: (i) multi-agent hybrid RS-IUSs (Section 2.1), (ii) two-stage segment-based RS-IUSs (Section 2.2) and (iii) two-stage stratified hierarchical hybrid RS-IUSs (Section 2.3).


# **2.1 Multi-agent hybrid RS-IUSs**

In existing literature multi-agent hybrid RS-IUSs provide application-specific combinations of inductive and deductive inference mechanisms (Matsuyama & Shang-Shouq Hwang, 1990). A traditional multi-agent hybrid RS-IUS architecture comprises the following modules (see Fig. 1).


2.3). The combination of top-down with bottom-up inference strategies achieves two operational advantages: (a) it provides better conditions for an otherwise ill-posed driven-without-knowledge segmentation first stage (refer to Part I Section 2.3) and (b) it allows restriction of intensive processing to a small portion of the image data (Matsuyama & Shang-Shouq Hwang, 1990), analogously to a focus of visual attention in pre-attentive biological vision (Mason & Kandel, 1991; Gouras, 1991; Kandel, 1991). The high-level processing second stage comprises (Matsuyama & Shang-Shouq Hwang, 1990): (I) a Spatial Reasoning Expert (SRE) whose aim is to trigger the instantiation, within a candidate local area, of plausible generic (3-D) object models found in the available world model, e.g., house, and (II) a SOMSE (refer to this text above) which uses domain-dependent knowledge about specific applications to: (i) prune the search space of specialized (3-D) object models (e.g., rectangular house, L-shaped house, etc.) linked by A-KIND-OF relations to the generic target (3-D) object model (e.g., house) provided by SRE; (ii) transform the 3-D appearance properties of the specialized (3-D) object model into a selected set of 2-D appearance properties based on the imaging sensor model; (iii) transform a target spatial relation in fuzzy terms (e.g., in front of) provided by SRE into a local area based on a trial-and-error heuristic search with no concrete theoretical basis and (iv) provide a consistency examination between quantitative absolute image features collected by LLVE in a local area and the target 2-D appearance constraints. In other words, the 2-D appearance properties must be satisfied by image features extracted by LLVE from a local area. Since the image structure in a local area is very simple compared with that of the entire image, image feature extraction performed by an object model-driven and locationally constrained LLVE can be very efficient and reliable compared with that performed by the same LLVE run image-wide at the first stage (Matsuyama & Shang-Shouq Hwang, 1990) (p. 41).

Fig. 1. Multi-agent hybrid systems for RS image understanding (derived from Figure 2.1 in (Matsuyama & Shang-Shouq Hwang, 1990), p. 36).


Table 1. SIAM™ system of systems. List of spaceborne optical imaging sensors eligible for use as input.

Multi-agent hybrid systems typically suffer from two main limitations.


To overcome these limitations, an alternative two-stage stratified hierarchical hybrid RS-IUS architecture, such as that shown in Fig. 3, was proposed in recent literature (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi, 2011a; Baraldi, 2011b; Baraldi et al., 2010c).

#### **2.2 Two-stage segment-based RS-IUSs**

Two-stage segment-based RS-IUSs comprise an inductive driven-without-knowledge image segmentation first stage and a second-stage object-based classifier, see Fig. 2. The latter can be implemented based on deductive or inductive inference mechanisms, say, as a prior knowledge-based non-adaptive decision-tree or a supervised data learning classifier (e.g., a Support Vector Machine, SVM (Bruzzone & Carlin, 2006)).
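The two-stage segment-based scheme described above can be sketched as follows. This is a minimal toy illustration, not the algorithm adopted by any commercial GEOBIA software: the function names `segment` and `classify_segments`, the region-growing criterion, the free parameter `tolerance` and the two class names are all assumptions introduced here. The sketch shows that first-stage labels (1, 2, 3, ...) are plain integers devoid of semantics and that a user-defined parameter is needed to make the segmentation problem better posed.

```python
from collections import deque

def segment(image, tolerance):
    """First stage: 4-connected region growing on gray levels.

    Returns a label map whose integer labels (1, 2, ...) carry no semantics.
    `tolerance` is a user-defined free parameter (hypothetical).
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue  # pixel already belongs to a segment
            next_label += 1
            seed = image[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not labels[rr][cc]
                            and abs(image[rr][cc] - seed) <= tolerance):
                        labels[rr][cc] = next_label
                        queue.append((rr, cc))
    return labels

def classify_segments(image, labels, bright_threshold):
    """Second stage: rule-based object classifier using the segment mean."""
    sums, counts = {}, {}
    for row_vals, row_labs in zip(image, labels):
        for value, lab in zip(row_vals, row_labs):
            sums[lab] = sums.get(lab, 0) + value
            counts[lab] = counts.get(lab, 0) + 1
    return {lab: ("bright" if sums[lab] / counts[lab] >= bright_threshold else "dark")
            for lab in sums}

image = [[10, 12, 200],
         [11, 13, 205],
         [90, 95, 210]]
labels = segment(image, tolerance=20)
classes = classify_segments(image, labels, bright_threshold=100)
```

Changing `tolerance` changes the number and shape of the segments, which illustrates why the quality of the second stage depends on a first stage that has no access to semantics.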

Due to the availability of a commercial GEOBIA software developed by a German company (Definiens Imaging GmbH, 2004; Esch et al., 2008), two-stage segment-based RS-IUSs have recently gained widespread popularity and are currently considered the state-of-the-art in both scientific and commercial RS image mapping application domains (Mather, 1994; Pekkarinen, Reithmaier & Strobl, 2009). In practice, under the guise of 'flexibility' current commercial 2-D object-based software provides overly complicated options to choose from (Hay & Castilla, 2006). This means that with their increasing diffusion commercial two-stage segment-based RS-IUSs show an increasing lack of productivity (Tapsall et al., 2010), consensus and research (Castilla et al., 2008; Hay & Castilla, 2006) (refer to Part I Section 2.4.1.2).

#### **2.3 Two-stage stratified hierarchical hybrid RS-IUS employing SIAM™ as its preliminary classification first stage**

Fig. 2. Two-stage segment-based hybrid RS-IUS architecture adopted, for example, by the eCognition commercial software toolbox (Definiens Imaging GmbH, 2004). Preliminary image simplification is pursued by means of an (ill-posed hierarchical) image segmentation approach which generates as output a segmented (discrete) map, either single-scale or multi-scale. Worthy of note is that first-stage output sub-symbolic informational primitives, namely, labeled segments (2-D objects, parcels), e.g., segment 1, segment 2, etc., are provided with no semantic meaning.

Accounting for the customary distinction between a model and the algorithm used to identify it (Baraldi et al., 2010a; Baraldi, 2011a), an original two-stage stratified hierarchical hybrid RS-IUS architecture (see Fig. 3) was identified starting from several RS-IUS implementations proposed by Shackelford and Davis in recent years (Shackelford & Davis, 2003a; Shackelford & Davis, 2003b). This novel RS-IUS architecture comprises the following phases (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b).

a. A radiometric calibration pre-processing stage, where digital numbers (DNs) are transformed into top-of-atmosphere reflectance (TOARF) or surface reflectance (SURF) values, with TOARF ⊇ SURF, the latter being an ideal (atmospheric noise-free) case of the former. This radiometric calibration constraint not only ensures the harmonization and interoperability of multi-source observational data in line with the Quality Assurance Framework for EO (QA4EO) guidelines (GEO/CEOSS, 2008), but is considered a necessary, although not sufficient, condition for input Earth observation (EO) imagery to be automatically interpreted (see Part I Section 2.7.1). It is worth mentioning that an RS-IUS suitable for mapping TOARF values into surface categories makes the inherently ill-posed (therefore, difficult to solve) atmospheric correction problem an optional MS image pre-processing stage, unlike competing classification approaches employing surface reflectance spectra, such as the ERDAS ATCOR3 (Richter, 2006) (see Part I Section 2.7.1).
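As a hedged illustration of the radiometric calibration constraint, the sketch below applies the two-step conversion commonly documented for Landsat-like sensors: a linear gain/offset rescaling of DNs into at-sensor radiance, followed by normalization for exo-atmospheric solar irradiance, Sun elevation and Earth-Sun distance. The function name `dn_to_toarf`, its parameter names and the numerical values in the usage line are assumptions made for illustration only; actual calibration coefficients must be read from the metadata of each image.

```python
import math

def dn_to_toarf(dn, gain, offset, esun, sun_elevation_deg, earth_sun_distance_au=1.0):
    """Convert a digital number into top-of-atmosphere reflectance.

    radiance = gain * dn + offset
    toarf    = pi * radiance * d^2 / (esun * cos(sun_zenith))
    All coefficient names are hypothetical placeholders for sensor metadata.
    """
    radiance = gain * dn + offset
    sun_zenith = math.radians(90.0 - sun_elevation_deg)
    return math.pi * radiance * earth_sun_distance_au ** 2 / (esun * math.cos(sun_zenith))

# Illustrative values only (not taken from any real calibration file).
example_toarf = dn_to_toarf(dn=120, gain=0.8, offset=-1.5, esun=1550.0,
                            sun_elevation_deg=55.0, earth_sun_distance_au=1.01)
```

No atmospheric correction is involved in this conversion, which is consistent with the statement above that atmospheric correction becomes optional when the classifier accepts TOARF values as input.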

Fig. 3. Novel hybrid two-stage stratified hierarchical RS-IUS architecture. This data flow diagram (DFD) shows processing blocks as rectangles and sensor derived data products as circles. In this example, a SPOT-5 MS image is adopted as input. The panchromatic (PAN) image can be generated from the MS image. The MS image is input to the preliminary classification first stage and, if useful, to second-stage class-specific classification modules. The PAN image is exclusively employed as input to second-stage stratified class-specific context-sensitive classification modules, where color information is dealt with by stratification. For example, stratified texture detection is computed in the PAN image domain, which reduces computation time.


In (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b), the abovementioned first-stage pixel-based preliminary classifier was designed and implemented as an original operational automatic near-real-time per-pixel multi-source multi-resolution application-independent SIAM™. To employ as input a radiometrically calibrated MS image acquired by almost any of the ongoing or future planned satellite optical missions, SIAM™ is designed as an integrated system of systems. It comprises a "master" 7-band Landsat-like SIAM™ (L-SIAM™) together with five downscaled ("slave", derived) versions of L-SIAM™ whose input is a MS image featuring a spectral resolution that overlaps with, but is inferior to, Landsat's. To summarize, SIAM™ combines six sub-systems (refer to Table 1).
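To make the notion of a deductive per-pixel preliminary classifier concrete, the following toy sketch maps the TOARF values of a single pixel onto a small set of spectral categories by means of fixed prior rules. It is emphatically not the SIAM™ decision tree, whose rules are not reproduced in this text: the function name `preliminary_label`, the choice of three bands, the use of a normalized difference vegetation index and every threshold are assumptions introduced for illustration.

```python
def preliminary_label(red, nir, swir):
    """Assign one spectral category to a pixel from TOARF values in [0, 1].

    Rules and thresholds are hypothetical; no training samples are required.
    """
    total = nir + red
    ndvi = (nir - red) / total if total > 0 else 0.0
    if ndvi > 0.4:
        return "vegetation"
    if red > 0.5 and swir < 0.2:
        return "snow or ice"
    if nir < 0.1 and swir < 0.1:
        return "water or shadow"
    if swir >= nir:
        return "bare soil or built-up"
    return "unknown"  # keeps the scheme completely exhaustive

pixels = [(0.05, 0.45, 0.20), (0.03, 0.02, 0.01), (0.80, 0.70, 0.10), (0.20, 0.25, 0.30)]
preliminary_map = [preliminary_label(*p) for p in pixels]
```

Because each pixel is labeled independently by rules defined before looking at the data, such a first stage needs neither user-defined parameters nor reference samples, and its output categories are mutually exclusive by construction.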


Table 2. Preliminary classification map legend adopted by L-SIAM™ at fine semantic granularity. Pseudo-colors of the 95 spectral categories are gathered based on their spectral end member (e.g., bare soil or built-up) or parent spectral category (e.g., "high" LAI vegetation types). The pseudo-color of a spectral category is chosen so as to mimic natural colors of pixels belonging to that spectral category.

Table 3. Preliminary classification map legend adopted by I-SIAM™ at fine semantic granularity. Pseudo-colors of the 52 spectral categories are gathered based on their spectral end member (e.g., bare soil or built-up) or parent spectral category (e.g., "high" LAI vegetation types). The pseudo-color of a spectral category is chosen so as to mimic natural colors of pixels belonging to that spectral category.

Fig. 4 to Fig. 6 show qualitatively that, in disagreement with a common opinion in the RS community where GEOBIA is considered indispensable for spaceborne VHR image understanding (Bruzzone & Carlin, 2006; Bruzzone & Persello, 2009; Persello & Bruzzone, 2010), the pixel-based SIAM™ is very successful in the automatic mapping of RS imagery, including VHR images (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b). This means that SIAM™ is not affected by the well-known salt-and-pepper classification noise effect which traditionally affects ordinary pixel-based classifiers (e.g., maximum-likelihood classifiers (Cherkassky and Mulier, 2006)), which is tantamount to saying that SIAM™ is successful in modeling the within-spectral-category variance.

Fig. 4(a). Web-Enabled Landsat Data (WELD) Project (USGS & NASA, 2011). This is a joint NASA and USGS project providing seamless consistent mosaics of fused Landsat-7 Enhanced TM Plus (ETM+) and MODIS data radiometrically calibrated into top-of-atmosphere reflectance (TOARF) and surface reflectance. These mosaics are made freely available to the user community. Each consists of 663 fixed location tiles. Spatial resolution: 30 m. Area coverage: Continental USA and Alaska. Period coverage: 7-year. Product time coverage: weekly, monthly, seasonal and annual composites.

Fig. 4(b). Including the map of Alaska at the top right. Preliminary classification map automatically generated by L-SIAM™ from the 2008 annual WELD mosaic shown in Fig. 4(a). Output spectral categories are depicted in pseudo colors. Map legend: refer to Table 2. To generate this map at national scale L-SIAM™ was run overnight by L. Boschetti (Univ. of Maryland) in Dec. 2010. To the best of this author's knowledge, this is the first example of such a high-level product automatically generated at both the NASA and USGS.

#### 110 Earth Observation

Fig. 5(a). 4-band GMES-IMAGE2006 Coverage 1 mosaic, consisting of approximately two thousand 4-band IRS-P6 LISS-III, SPOT-4, and SPOT-5 images, mostly acquired during the year 2006, depicted in false colors: Red – Band 4 (Short Wave InfraRed, SWIR), Green – Band 3 (Near IR, NIR), Blue – Band 1 (Visible Green). Down-scaled spatial resolution: 25 m.

Fig. 5(b). Preliminary classification map automatically generated by S-SIAM™ from the mosaic shown in Fig. 5(a). Output spectral categories are depicted in pseudo colors. A map legend similar to Table 2 is adopted: water and shadow areas are in blue, clouds in white, snow and ice in light blue, vegetation types in different shades of green, rangeland types in different shades of light green, barren land types in different shades of brown and grey. To the best of this author's knowledge, this is the first example of such a high-level product automatically generated at the European Commission – Joint Research Center (EC-JRC).

Fig. 6(a). QuickBird-2 image, 2.4 m spatial resolution, acquisition date 2010-03-16, radiometrically calibrated into TOARF values, depicted in false colors (R: 3, G: 4, B: 1). Default image histogram stretching: ENVI linear stretching 2%.

Fig. 6(b). Automatic Q-SIAM™ preliminary mapping of the QB-2 image shown in Fig. 6(a). Spectral categories are depicted in pseudo colors. Map legend: see Table 3. It is noteworthy that, within the Q-SIAM™ mutually exclusive and completely exhaustive classification scheme, cloud detection is *per se* an interesting operational product with relevant commercial applications and, to the best of this author's knowledge, without alternative solutions in either commercial or scientific RS-IUSs.

Fig. 7(a). Zoomed area of a Landsat 7 ETM+ image of Virginia, USA (path: 16, row: 34, acquisition date: 2002-09-13), depicted in false colors (R: band ETM5, G: band ETM4, B: band ETM1), 30 m resolution, calibrated into TOARF values.

Fig. 7(b). 2nd-stage stratified vegetated land cover classification map generated in series with the L-SIAM™ first stage from Fig. 7(a). This 2nd-stage map consists of 19 vegetated/non-vegetated land cover classes, depicted in pseudo-colors, including: crop field or grassland, broad-leaf forest, needle-leaf forest and non-vegetated pixels (in black). Input features are: spectral layers generated by L-SIAM™, (achromatic) brightness and multi-scale isotropic texture features extracted from the brightness image.

To the best of this author's knowledge no unifying automatic multi-sensor multi-resolution near real-time RS image classification platform alternative to SIAM™ can be found in existing literature. This is tantamount to saying that SIAM™ provides the first operational example of an automatic multi-sensor multi-resolution near real-time EO system of systems envisaged under on-going international research programs such as the Global EO System of Systems (GEOSS) conceived by the Group on Earth Observations (GEO) (GEO, 2005; GEO, 2008a) and the Global Monitoring for the Environment and Security (GMES), which is an initiative led by the European Union (EU) in partnership with the European Space Agency (ESA) (ESA, 2008; GMES, 2011) (see Part I Section 1).

Fig. 7 shows an example of an automatic 2nd-stage stratified rule-based vegetated land cover classification system in series with the L-SIAM™ first stage. The two-stage automatic classifier employing L-SIAM™ as preliminary classification first stage (refer to Fig. 3) is input with a 7-band Landsat image radiometrically calibrated into TOARF values, shown in Fig. 7(a). The 2nd-stage stratified rule-based vegetated land cover classification system in series with the L-SIAM™ first stage employs as input features: spectral-based layers (strata, generated by L-SIAM™ at first stage), (achromatic) brightness and multi-scale isotropic texture extracted from the brightness image. The 2nd-stage classifier provides as output a classification map consisting of 19 vegetated/non-vegetated land cover classes, depicted in pseudo-colors, including: crop field or grassland, broad-leaf forest, needle-leaf forest and non-vegetated pixels (in black), see Fig. 7(b).
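The stratified second stage described above can be sketched in a few lines. This toy example is not the author's rule set: the function names, the 3×3 local range used as a stand-in for a multi-scale isotropic texture feature, the threshold value and the reduction to three output classes are assumptions made for illustration. The point of the sketch is that the texture feature of a vegetation pixel is computed from the brightness image using exclusively neighboring pixels that belong to the same first-stage stratum.

```python
def stratified_range(brightness, strata, r, c, stratum):
    """Local 3x3 brightness range restricted to pixels of the given stratum."""
    rows, cols = len(brightness), len(brightness[0])
    window = [brightness[rr][cc]
              for rr in range(max(0, r - 1), min(rows, r + 2))
              for cc in range(max(0, c - 1), min(cols, c + 2))
              if strata[rr][cc] == stratum]
    return max(window) - min(window)

def second_stage(strata, brightness, texture_threshold):
    """Rule-based classification run only inside the 'vegetation' stratum."""
    out = []
    for r, row in enumerate(strata):
        out_row = []
        for c, stratum in enumerate(row):
            if stratum != "vegetation":
                out_row.append("non-vegetated")
            elif stratified_range(brightness, strata, r, c, "vegetation") > texture_threshold:
                out_row.append("forest")
            else:
                out_row.append("crop field or grassland")
        out.append(out_row)
    return out

strata = [["vegetation", "vegetation", "water or shadow"],
          ["vegetation", "vegetation", "water or shadow"]]
brightness = [[0.30, 0.31, 0.05],
              [0.30, 0.32, 0.04]]
land_cover = second_stage(strata, brightness, texture_threshold=0.1)
```

In this example the vegetation pixels adjacent to water are labeled as low-texture vegetation. Without the stratum mask, the dark water pixels would enter the window and inflate the local range above the threshold, so the same pixels would be mistaken for high-texture forest.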

# **3. Inconsistencies and limitations of the Diamant computational theory and algorithms**

An original analysis of the Diamant definitions reported in Part I Section 2.2.3 and Diamant's image segmentation and contour detection algorithms summarized in Part I Section 2.5 is provided below.

# **3.1 Comments on the Diamant definitions of data, information and knowledge**

According to this author, the Diamant definitions reported in Part I Section 2.2.3 are affected by three major drawbacks.

i. Diamant states that "information elicitation (extraction) does not require incorporation of any high-level knowledge" (Diamant, 2010a; Diamant, 2010b), which is tantamount to saying that detection of non-semantic primary data structures (data objects), e.g., (2-D) image segments, in an unlabeled data set, e.g., a (2-D) image, does not require incorporation of any high-level (prior) knowledge. Based on this statement it is possible to conclude that despite his theoretical anti-conformism, namely, his willingness to replace the MAL-from-examples paradigm with the MAT-by-rules approach, Diamant is a conformist in practice. In fact, the Diamant image contour detection and image segmentation algorithms (see Part I Section 2.5) fit existing CV system architectures well established in literature, such as, respectively, the Marr CV system architecture, conceived in the 1980s and comprising a zero-crossings (contour detection) primal sketch, and RS-IUSs where an image segmentation first stage is adopted in agreement with the GEOBIA approach (see Part I Section 2.4.1.2). In other words, there is a clear contradiction in terms between the Diamant claim of replacing the MAL-from-examples with a MAT-by-rules paradigm and his practical proofs of concept, consisting of image segmentation and contour detection algorithms 100% consistent with the same MAL-from-examples paradigm he intends to overcome.


To conclude, Diamant appears to have totally misunderstood one of two facts about the MAL-from-examples paradigm. These two facts hold true for MAL from unlabeled data and MAL from labeled data algorithms, respectively, as described below.

a. MAL from unlabeled (unsupervised) data (see Part I Section 2.1 and Part I Section 2.4.1). Any machine learning from unlabeled data approach (e.g., unlabeled data clustering, image segmentation) is inherently ill-posed and requires prior knowledge to become better posed. It means that any attempt to extract non-semantic primary data structures (data objects), e.g., image segments and unlabeled data clusters, from an unlabeled data set (e.g., an image) without incorporation of high-level knowledge provided by an external supervisor is a fatal misconception, committed by Diamant himself, stemming from the fallacies (inherent ill-posedness) of the MAL-from-examples paradigm.

b. MAL from labeled (supervised) data (see Part I Section 2.1 and Part I Section 2.4.2). It is true that, in Diamant's words, "knowledge about the rules that underpin (semantic) secondary (data) structures formation (from primary data structures considered as non-semantic and driven-without-knowledge) is a property of human observers (or their artificial counterparts) and not an inherent property of the data... (therefore) attempts to extract semantics from data are a fatal misconception stemming from the fallacies of the data-processing paradigm..." (Diamant, 2010a). This quote implies that no semantic information can be extracted from objective sensory data, but a correlation function can be established between semantic concepts and objective data for toy data understanding problems exclusively (refer to Part I Section 1 and Part I Section 2.1).

### **3.2 Comments on the Diamant image segmentation algorithm**

In practical terms, the image segmentation algorithm proposed by Diamant can be subjected to the following criticisms.

	- Degree of automation. The following questions remain unanswered. What is the number of the image segmentation-free parameters to be user-defined? Have these user-defined parameters a physical meaning? What is their range of change?
	- Robustness to changes in input parameters to be user-defined.
	- Robustness to changes in the input data set acquired across time, space and sensors. In his paper (Diamant, 2005) Diamant applies his image segmentation algorithm to a single toy problem whose input data set consists of a panchromatic image 640×480 pixels in size. What about color images? What about satellite imagery? What about synthetic images of known visual properties?
	- Scalability. For example, does this image segmentation algorithm apply to data sets of different spatial scales, e.g., mosaics of hundreds of satellite images to generate classification maps at global scale where small but genuine image details (e.g., one pixel-wide roads) must be well preserved? I am afraid it does not... Does it apply to different sensors and users?
	- Efficiency in computation time and memory occupation.
	- Accuracy in terms of spatial quality of the segment boundaries (Baraldi et al., 2005; Persello & Bruzzone, 2010).

The conclusion is that based on existing literature the overall quality of the Diamant image segmentation algorithm remains unknown, which is often the case with the dozens of alternative image segmentation algorithms published in RS and CV literature each year (refer to Part I Section 2.4.1.2). Perhaps it is also due to these implementation shortcomings that so many researchers and practitioners ignored or criticized Diamant's methodological speculations.

	- In (Diamant, 2005) Diamant writes "segmentation/classification" and then "spatially connected regional groups (of pixels)" as "clusters" rather than segments, blobs or regions (see Part I Section 2.3). It is well known that (2-D) image segmentation, labeled (supervised) data classification and unlabeled (unsupervised) data clustering are completely different inductive learning-from-data problems (see Part I Section 2.4). Mixing these terms is a relevant conceptual mistake.
	- It is well known that image region extraction is the dual task of edge detection, in fact they are both inherently ill-posed inductive learning-from-unlabeled data problems (see Part I Section 2.4.1.2). In (Diamant, 2005), quite surprisingly Diamant acknowledges the ill-posedness of edge detection, but appears to ignore the inherent ill-posedness (subjective nature) of image region extraction acknowledged by a relevant portion of existing literature (see Part I Section 2.4.1.2). In fact, he states: "the efficiency of (my own) unsupervised top-down directed region-based (learning from unlabeled data) image segmentation is hard to disprove today" (Diamant, 2005). For example, by replacing pixels belonging to the same segment with their segment-based mean value (often called mean image), Diamant's image segmentation algorithm provides as output a piecewise constant approximation of the input image. Of course, researchers and practitioners interested in texture segmentation would find the Diamant piecewise constant image segmentation of little utility. In fact, the Diamant image segmentation algorithm incorporates no texture model. In practice, it detects texture elements (textons) rather than textures (made of textons) in the image. This accounts for the subjective nature of the image segmentation problem which is apparently ignored by Diamant.
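The piecewise constant approximation mentioned above, i.e., the mean image, can be computed as in the following minimal sketch. The function name `mean_image` and the toy data are assumptions introduced here; the label map is supposed to be provided by any image segmentation algorithm.

```python
def mean_image(image, labels):
    """Replace every pixel with the mean value of the segment it belongs to."""
    sums, counts = {}, {}
    for row_vals, row_labs in zip(image, labels):
        for value, lab in zip(row_vals, row_labs):
            sums[lab] = sums.get(lab, 0.0) + value
            counts[lab] = counts.get(lab, 0) + 1
    return [[sums[lab] / counts[lab] for lab in row_labs] for row_labs in labels]

# A checkerboard texture falling entirely within one segment.
textured = [[0, 10, 0, 10],
            [10, 0, 10, 0]]
one_segment = [[1, 1, 1, 1],
               [1, 1, 1, 1]]
flattened = mean_image(textured, one_segment)
```

When a textured area is treated as one segment, its mean image is a flat surface where the texture is lost; when the same area is split into its texture elements, the texture as a whole is never detected. Either way a piecewise constant model carries no texture information.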

To summarize, the Diamant image segmentation algorithm appears as "yet another image segmentation algorithm" (Baraldi et al., 2010a) based on heuristics whose superiority against alternative approaches is completely unproved. In other words, the image segmentation algorithm proposed by Diamant cannot be considered as adequate proof of his concepts (see Part I Section 2.2.3.2).

#### **3.3 Comments on the Diamant contour detector**

In practical terms, the contour detection algorithm proposed by Diamant can be subjected to the following criticisms.

	- Correlation between *I*<sub>int</sub> = Eq. (1-3) and status = Eq. (1-4) can be relevant, i.e., *I*<sub>loc</sub> = Eq. (1-2) = Eq. (1-3) × Eq. (1-5) is the product of two correlated contrast values, one of which is taken in absolute value.
	- Term *I*<sub>int</sub> = Eq. (1-3) is not consistent with the psychophysical phenomenon of the Mach bands: where a luminance (radiance, intensity) ramp meets a plateau, there are spikes of brightness (perceived luminance), whereas there are none in the luminance profile. This is the sole case of continuity in the luminance profile capable of generating spikes of brightness (Baraldi & Parmiggiani, 1996a).

To summarize, the Diamant contour detector appears to be neither new nor biologically plausible. It can be considered as "yet another contour detector" (Baraldi et al., 2010a) based on heuristics whose superiority against alternative approaches is completely unproved. In other words, the contour detector proposed by Diamant cannot be considered as adequate proof of his concepts (see Part I Section 2.2.3.2).

# **4. Revised/novel definitions of objective continuous sub-symbolic sensory data, continuous physical information, subjective discrete semi-symbolic data structure, discrete semantic-square (semantic<sup>2</sup>) information and prior knowledge base**

As a revision of Diamant's works (Diamant, 2005; Diamant, 2008; Diamant, 2010a; Diamant, 2010b), this section proposes a new set of definitions of: (i) sub-symbolic objective primary data element in an objective sensory data set, (ii) semi-symbolic subjective secondary data structure, (iii) objective physical information, (iv) subjective semantic-square (semantic<sup>2</sup>) information and (v) subjective prior knowledge base (ontology or model of the 3-D world) provided by an external subjective supervisor (human, God or equivalent machine).

# **4.1 Levels of aggregation of objective continuous sub-symbolic sensory data**

There are five fine-to-coarse possible levels of aggregation of objective continuous sub-symbolic sensory data. These levels of aggregation are either sub-symbolic (non-semantic), semi-symbolic or symbolic. Semi-concepts are defined as stable concepts (percepts, classes of 3-D objects in the world) whose semantic meaning is adopted at the bottom level (layer 0) of an ontology (see Part I Section 2.2.2). The semantic information of semi-concepts (e.g., in a RS image, land cover semi-concepts are spectral categories such as *water or shadow*, *snow or ice*, *bare soil or built-up*, *vegetation*, etc.) is superior to that of objective data, whose semantic information is null, but equal or inferior (i.e., not superior) to that of concepts belonging to higher levels of abstraction (aggregation) in the ontology at hand (e.g., in a RS image classification taxonomy such as the International Geosphere-Biosphere Programme (IGBP) land cover classification scheme (FAO, 2000), target (3-D) land cover classes are *water bodies*, *snow or ice*, *barren*, *urban and built-up*, *needle-leaf forest*, *broad-leaf forest*, *mixed forest*, *shrubland*, *grassland*, *cropland*, etc.) (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b). An ontology is a hierarchical abstract representation (model) of the (3-D) world. Well-known examples of RS data classification taxonomies are the aforementioned IGBP land cover classification scheme (FAO, 2000), the Co-ordination of Information on the Environment (CORINE) (European Commission Joint Research Center, 2005), the U.S. Geological Survey (USGS) classification hierarchy (Lillesand & Kiefer, 1994) and the Food and Agriculture Organization of the United Nations (FAO) Land Cover Classification System (LCCS) (Di Gregorio & Jansen, 2000; Herold et al., 2006). An ontology can be modeled as a semantic network consisting of a hierarchical class taxonomy, represented as an inverted tree whose leaves are at the bottom layer 0, plus relationships between classes as arcs between nodes (refer to Part I Section 2.2.2).

The five fine-to-coarse possible levels of aggregation of objective sub-symbolic sensory data are listed below.


as one semi-symbolic secondary data structure. Each label belongs to a discrete and finite set of semi-concepts. The semantic meaning of semi-concepts (e.g., *vegetation*) is superior to zero (the semantic meaning of unlabeled primary data elements) and not superior (i.e., equal or inferior) to that of concepts in the real (3-D) world. A discrete and finite quantitative data set consisting of *p* unlabeled objective primary data elements (e.g., a multi-spectral image consisting of *p* pixels, refer to point 3. above) always consists of a discrete and finite set of semi-symbolic secondary data structures whose cardinality is identified hereafter as *s*, such that inequality (*s* ≤ *p*) always holds. It is noteworthy that if equality (*s* == *p*) holds, this does not correspond to a trivial case since secondary data structures are semi-symbolic while primary data elements are sub-symbolic. To the best of this author's knowledge, it is at the level of subjective semi-symbolic secondary data structures that the view of the present author starts diverging from *all* existing CV algorithms and implementations, including GEOBIA-based RS-IUSs and Diamant's image segmentation and contour detection algorithms. This degree of novelty is consistent with well-known evidence collected in CV and MAL domains. For example:


In practice, the following definition holds.

Discrete semi-symbolic secondary data structure = Continuous sub-symbolic primary data element(s) + discrete semi-symbolic label belonging to a discrete and finite set of semi-concepts (e.g., in RS image understanding, possible semi-concepts are spectral categories equivalent to land cover class sets consisting of one or more land cover classes; examples of spectral categories are *vegetation*, *water or shadow*, *bare soil or built-up*, etc. (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi, 2011a; Baraldi, 2011b; Baraldi et al., 2010c)).

This also means that the set of discrete semi-symbolic secondary data structures incorporates the continuous objective sensory data set.

	- in line with the CV system proposed by Marr at the level of computational theory (see Part I Section 2.6) when he states: "vision goes symbolic almost immediately, right at the level of zero-crossings (primal sketch)… without loss of information" (Marr, 1982) (p. 343) (refer to Part I Section 2.3);
	- in contrast with the CV system proposed by Marr at the level of algorithm design and implementation (see Part I Section 2.5), where the term primal sketch identifies the non-symbolic output of a zero-crossings algorithm, which is an instance of the unlabeled data learning class of image edge detectors/region extractors (Marr, 1982).
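The definition above can be illustrated with a minimal sketch in which a label is attached to, rather than substituted for, a primary data element. The class name `SecondaryDataStructure`, the vocabulary `SEMI_CONCEPTS` and the reflectance values are hypothetical and serve illustration only; they are not taken from any implementation referenced in this work.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical semi-concept vocabulary (spectral categories named in the text).
SEMI_CONCEPTS = (
    "vegetation",
    "water or shadow",
    "bare soil or built-up",
    "snow or ice",
)


@dataclass(frozen=True)
class SecondaryDataStructure:
    """One sub-symbolic primary data element plus one semi-symbolic label."""

    primary: Tuple[float, ...]  # continuous sensory values, e.g. band reflectances
    label: str                  # semi-concept from a discrete and finite set

    def __post_init__(self):
        # Reject labels outside the finite vocabulary of semi-concepts.
        if self.label not in SEMI_CONCEPTS:
            raise ValueError(f"unknown semi-concept: {self.label!r}")


pixel = (0.04, 0.45)  # hypothetical (red, near-infrared) reflectances
labeled = SecondaryDataStructure(primary=pixel, label="vegetation")

# The label is added to the sensory values, so the original primary data
# element remains recoverable exactly (no loss of information).
assert labeled.primary == pixel
```

Because the sensory values are stored unchanged next to the label, going symbolic in this sketch does not discard any physical information.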

It is noteworthy that in a (2-D) preliminary classification map domain, a labeled semi-symbolic segment may be defined as a spatially connected set of secondary semi-symbolic data structures featuring the same label, say, connected pixels featuring label *vegetation*. Therefore, in a (2-D) preliminary classification map domain, semi-symbolic pixels belong to semi-symbolic image segments which belong to semi-symbolic image strata (layers) defined as image-wide sets of semi-symbolic segments featuring the same semi-symbolic label. In other words, in the preliminary classification map domain, three spatial types co-exist: **semi-symbolic pixels in semi-symbolic image segments in semi-symbolic image strata**. This would end the bad-faith antagonism between unlabeled pixels versus labeled non-symbolic segments (e.g., segment 1, segment 2, etc.) which affects traditional pixel-based versus object-based RS-IUSs and CV systems (refer to Part I Table 1). A labeled subjective semi-symbolic quantitative data set can be described (encoded) according to a given pair of one mathematical and one natural vocabulary/language capable of accounting for both the quantitative and semantic (qualitative, subjective) nature of labeled subjective semi-symbolic secondary data structures (refer to point 4. above).
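The co-existence of pixels, segments and strata can be sketched as follows, assuming a toy preliminary classification map and 4-adjacency as the connectivity rule. Both are assumptions made for illustration, as are the function name `segments_and_strata` and the map contents.

```python
from collections import deque


def segments_and_strata(label_map):
    """Group labeled pixels into 4-connected segments and image-wide strata.

    label_map: list of rows, each a list of semi-concept labels (strings).
    Returns (segments, strata) where segments is a list of
    (label, set of (row, col)) pairs and strata maps each label to the
    indices of the segments carrying that label.
    """
    rows, cols = len(label_map), len(label_map[0])
    seen = [[False] * cols for _ in range(rows)]
    segments, strata = [], {}
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            label = label_map[r][c]
            pixels, queue = set(), deque([(r, c)])
            seen[r][c] = True
            # Breadth-first growth over same-label 4-neighbors.
            while queue:
                y, x = queue.popleft()
                pixels.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and label_map[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            strata.setdefault(label, []).append(len(segments))
            segments.append((label, pixels))
    return segments, strata


# Hypothetical 3 x 3 preliminary classification map.
toy_map = [
    ["vegetation", "vegetation", "water or shadow"],
    ["bare soil or built-up", "vegetation", "water or shadow"],
    ["vegetation", "bare soil or built-up", "bare soil or built-up"],
]
segments, strata = segments_and_strata(toy_map)
```

In this sketch every pixel keeps its own label, every segment inherits the label of its pixels, and a stratum is simply the image-wide collection of segments sharing one label, so none of the three spatial types is unlabeled.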

#### **4.2 Continuous physical information**

**Continuous physical (quantitative, objective, sensory) information. This is a hierarchical (i.e., multi-scale, including one-scale as a special case) description (representation), namely, down-scale encoding (decomposition), up-scale decoding (reconstruction) or one-scale transcoding (from one data format to another at the same hierarchical level), of the physical objective data set based on a given mathematical non-natural vocabulary/language.** This hierarchical description/representation of the objective sensory data set can be either lossless or lossy, depending on the exact/non-exact reconstruction (decoding) of the original data set from its representation (encoding). For example, an FFT of a time-signal is a one-scale transcodification of the signal from the time to the frequency domain. A well-known example of down-scale encoding/up-scale decoding is the Gaussian-Laplacian image pyramid (Burt & Adelson, 1983). It means that physical information stems from the combination of an objective data set with a mathematical non-natural vocabulary/language. To summarize the concept of physical information, we can write the following definition.

Continuous objective data set + (arbitrary) multi-scale down-scale encoding, up-scale decoding or one-scale transcoding/description/data format = hierarchical physical information encompassing down-scale/ fine-to-coarse resolution/ compression/ encoding, up-scale/ coarse-to-fine resolution/ decompression/ decoding, and/or one-scale transcodification (from one data format to another at the same hierarchical level), either lossless or lossy.
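The distinction between lossless and lossy descriptions can be made concrete with a minimal sketch. It assumes a one-level average/difference split of a 1-D signal, a deliberately simpler stand-in for the Gaussian-Laplacian pyramid cited above and not an implementation of it; the function names and the sample signal are hypothetical.

```python
def encode(signal):
    """Down-scale encoding: pairwise averages (coarse) and half-differences (detail)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    coarse = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return coarse, detail


def decode(coarse, detail=None):
    """Up-scale decoding; if the detail band is omitted the result is lossy."""
    if detail is None:
        detail = [0.0] * len(coarse)
    out = []
    for c, d in zip(coarse, detail):
        out.extend([c + d, c - d])
    return out


signal = [4.0, 2.0, 5.0, 9.0]      # hypothetical even-length 1-D data set
coarse, detail = encode(signal)
lossless = decode(coarse, detail)  # coarse + detail: exact reconstruction
lossy = decode(coarse)             # coarse only: fine detail is lost
```

With both bands retained the original data set is reconstructed exactly, whereas dropping the detail band yields a smoothed approximation. In neither case does the description carry any semantic label: it remains physical information.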

#### **4.3 Discrete semantic-square information**

**Discrete semantic-square (semantic2)** (where semantic is a synonym of categorical, symbolic, subjective, abstract, qualitative, vague, but persistent, stable, see Part I Section 2.1) **information (concepts, percepts) stems from the semantic2 labeling of an objective data set performed by an external subjective supervisor** (human, God or equivalent machine) **provided with a subjective hierarchical prior knowledge base** (ontology or model of the (3-D) world, equivalent to an inverted tree with leaves at the bottom level 0, see Part I Section 2.2.2). **Semantic2 labeling occurs when a subjective supervisor (first source of subjectivity), provided with his/her own subjective ontology (second source of subjectivity), observes and scrutinizes the objective data set, consisting of** *p* **sub-symbolic primary data elements (refer to point 3. in Section 4.1), to achieve the following.** 


This definition of semantic2 labeling disagrees at the level of the aforementioned point a. with the traditional definition of semantic labeling provided by MAL, which encompasses existing CV systems (e.g., Diamant's (Diamant, 2005)) and RS-IUSs (e.g., (Definiens Imaging GmbH, 2004; Matsuyama & Shang-Shouq Hwang, 1990)). In fact, point a. above states that **semantic2 information stems naturally (automatically, instantaneously) from the simultaneous interaction of three necessary and sufficient components.** 



The aforementioned points i.-iii. imply that **objective sensory data**, *per se*, **do not possess any semantic2 information, but physical information exclusively.** Rather, **semantic<sup>2</sup> information incorporates objective data as one-of-three components**. This also means that nobody should disagree with Diamant when he repeats over and over that sensory data do not possess semantic information, therefore semantic information cannot be *extracted* from sensory data (Diamant, 2010a). On the contrary, Diamant's statement should not be considered original at all because it has been perfectly acknowledged in philosophy for hundreds of years, as well as in psychophysical studies of perception (Matsuyama & Shang-Shouq Hwang, 1990) and MAL in the last 50 years (Cherkassky & Mulier, 2006). This concept is summarized below.


The foregoing comments also mean that Diamant is right, although vague, when he states that "semantics is a property of a human observer" (Diamant, 2010a). **To state this more precisely, since semantic2 information naturally (automatically, instantaneously) stems from the interaction of three necessary and sufficient components i.-iii. (see above in this text), then semantic2 information cannot be separated from any of its three components.** For example, let us think of a piano (symbolic data structure) whose objective presence (fact) requires the simultaneous presence of a subjective human actor (or equivalent machine) to generate whatever sound (semantic information). The sound (generated semantic information) is neither in the piano, nor in the piano player, nor in his/her prior knowledge of what a piano is all about, but in the instantaneous combination of these three factors. This also means that **semantic2 information** quite obviously **changes with the objective data set, the subjective human supervisor and his/her own subjective ontology.** In particular (refer to this text above), **semantic2 information means there are two subjective actors in the semantic labeling of objective sensory data, namely, the subjective external observer and scrutinizer (or equivalent machine) and his/her own ontology or semantic (abstract) model of the world**. In fact, it is well known that not all humans adopt the same ontology, and two humans who adopt the same ontology do not apply it in the same way through time when interpreting a given observation. For example, two players will never generate the same music when playing the same musical score on the same piano. Not even the same player will ever generate the same music when playing the same musical score twice on the same piano. To summarize these concepts we can write the following definition.

Objective sensory data set + subjective supervisor provided, as such, with a subjective prior hierarchical knowledge base (ontology) = hierarchical semantic2 (subjective2) information, which includes physical information at the bottom level 0 of the inverted tree which deals with the semantic granularity of semi-concepts assigned to semi-symbolic secondary data structures.

### **4.4 Subjective hierarchical (multi-scale) prior knowledge base**

**Subjective hierarchical (multi-scale) prior knowledge base (ontology, model of the (3-D) world) equivalent to a semantic net or inverted tree with leaves at the bottom level 0**  where physical information is incorporated. Refer to this text above.

# **4.5 Intelligence**

**Intelligence** (cognition) is the system's ability to aggregate bottom-up (from-data-to-concepts) and disassemble top-down (from-concepts-to-data) semantic information (which incorporates physical information) across the hierarchical levels of a subjective prior knowledge base.
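A minimal sketch of bottom-up aggregation and top-down disassembly across an inverted tree is given below. The three-layer tree encoded in `PARENT` is purely hypothetical: it borrows a few land cover class names from the text for readability and is not the ontology adopted in this work.

```python
# Hypothetical ontology stored as child -> parent links (inverted tree).
PARENT = {
    "needle-leaf forest": "forest",
    "broad-leaf forest": "forest",
    "grassland": "herbaceous cover",
    "cropland": "herbaceous cover",
    "forest": "vegetated land",
    "herbaceous cover": "vegetated land",
}


def aggregate(concept):
    """Bottom-up: chain of increasingly abstract concepts up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def disassemble(concept):
    """Top-down: bottom-layer leaves reachable from a concept."""
    children = sorted(c for c, p in PARENT.items() if p == concept)
    if not children:
        return [concept]  # the concept is itself a leaf
    leaves = []
    for child in children:
        leaves.extend(disassemble(child))
    return leaves


path_up = aggregate("cropland")
leaves_down = disassemble("vegetated land")
```

The same tree is traversed in both directions, which is the sense in which the definition above couples from-data-to-concepts and from-concepts-to-data processing to a single prior knowledge base.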

# **4.6 Information processing system**

**An information processing system, cognitive system or intelligent system** transforms an input sensory data set into an output instantiation of a story in natural language whose hierarchical structure is provided by an ontology or inverted tree retained in the system's memory before looking at the sensory data.

To summarize, the aforementioned novel definitions sketch a RS-IUS where information goes symbolic during the pre-attentive vision phase to generate a semi-symbolic primal sketch (preliminary classification map). This is in line with the CV system proposed by Marr at the level of computational theory (see Part I Section 2.6) when he states: "vision goes symbolic almost immediately, right at the level of zero-crossings (primal sketch)" (Marr, 1982), p. 343 (see Part I Section 2.3). However, it differs from the CV system proposed by Marr at the level of primal sketch implementation (see Part I Section 2.6) consisting of a subsymbolic zero-crossing algorithm (Marr, 1982). In addition, the novel RS-IUS sketched above differs at the level of both computational theory and algorithm design and implementation from existing CV systems such as GEOBIA systems (Definiens Imaging GmbH, 2004; Esch et al., 2008), including Diamant's (Diamant, 2005; Diamant, 2008; Diamant, 2010a; Diamant, 2010b), where an unlabeled data learning (driven-without-knowledge) algorithm is adopted at the first stage.

# **5. Practical consequences of the proposed definitions on CV, AI and MAL system design and implementation strategies**

Practical consequences of the definitions proposed in Part II Section 4 on CV, AI and MAL system design and implementation strategies are several, more detailed, better posed and, therefore, far more relevant than Diamant's (Diamant, 2010a). Thus, they should benefit from more favorable consideration by the scientific community.

	- semantic in nature (see Part I Section 2.3), therefore it is called *preliminary classification map*;
	- capable of preserving small, but genuine image details (high spatial frequency image components). This requirement is inconsistent with existing image segmentation algorithms which are inherently affected by the *uncertainty principle*  according to which, for any contextual (neighborhood) property, we cannot simultaneously measure that property while obtaining accurate localization (Corcoran & Winstanley, 2007; Petrou & Sevilla, 2006) (see Part I Section 2.4.1.2).

Although he stated that vision goes symbolic right at the output of the preattentive vision phase, which has to affect the architectural level of understanding of a CV system (see Part I Section 2.6), Marr selected a sub-symbolic edge detection (zero-crossing) algorithm for primal sketch generation (Marr, 1982). By embracing the Marr computational theory rather than his algorithmic solutions, the present author concludes that, as output, **the preattentive visual phase no longer generates sub-symbolic image primitives, namely, non-semantic points and edges or, vice versa, image regions (which is what was implemented by Marr (Marr, 1982)), but semi-symbolic secondary data structures, namely, semi-symbolic pixels in semi-symbolic segments in semi-symbolic strata** (see Part II Section 4) (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b).


**continuous sensory data.** This conclusion is by no means novel as it is well known in literature. For example, Shunlin Liang summarizes this concept in a few words: statistical pattern recognition systems are based on correlation relationships between objective sensory (e.g., RS) data and either continuous (e.g., LAI) or categorical (e.g., land surface) variables (see Part I Section 2.1) (Shunlin Liang, 2004). **Unfortunately, low or no correlation can be found between continuous sensory data and a finite and discrete set of categorical variables, corresponding to independent random variables generating "distinguishable" data structures (data aggregations, data clusters) in real-world data mapping problems at large data scale or fine semantic granularity, other than toy problems at small data scale and coarse semantic granularity.** This low correlation effect is due to the combination of two factors.


Does this mean the relevant effort spent by the MAL community to develop driven-without-knowledge image segmentation algorithms (Castilla et al., 2008) or, say, self-organizing topology-preserving unlabeled data clustering algorithms (Fritzke, 1997; Martinetz & Schulten, 1994), has been worthless? Fortunately, not. It rather means the following.

	- I It should be replaced by a deductive MAT-by-rules approach where community-agreed prior knowledge is conveyed to generate as output a lossless semi-symbolic product (consisting of semi-concepts). For example, in a RS-IUS, the MAT-by-rules first stage should generate a preliminary classification map (see Part II Section 4) where small, but genuine image details are well preserved (refer to this text above).
	- II If useful, it should be:
		- a. adapted to work on a driven-by-knowledge stratified (semantic masked) basis and
		- c. next, moved to the second stage of a two-stage stratified hierarchical hybrid cognitive system. For example, a two-stage stratified hierarchical hybrid RS-IUS architecture has been proposed in recent literature, see Fig. 3 (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b).
	- i. The main application domain of supervised data learning algorithms should be considered function regression where input and output variables are continuous non-semantic, see Fig. 1.
	- ii. When a supervised data learning classifier (see Part I Section 2.4.2) is adopted as the first stage of a two-stage hybrid cognitive system, CV system or RS-IUS, it should be considered highly inappropriate. An experimental proof of this concept is that supervised MAL algorithms (say, SVMs), either context-insensitive (e.g., pixel-based) or context-sensitive (Bruzzone & Carlin, 2006; Bruzzone & Persello, 2009; Persello & Bruzzone, 2010), considered successful in terms of operational QIs (refer to Part I Section 2.7.2) at local/regional scale, become impracticable in mapping RS image mosaics consisting of hundreds of images at national/continental/global scale (Chengquan Huang et al., 2008). In these real world problems the cost, timeliness, quality and availability of adequate reference (training) data sets derived from field sites, existing maps and tabular data are currently considered the most limiting factors on RS data product generation and validation (Gutman et al., 2004). In particular, the first-stage supervised data learning classifier of a two-stage hybrid RS-IUS should be:

	- a. adapted to work on a driven-by-knowledge stratified (semantic masked) basis and
	- d. next, moved to the second stage of a two-stage stratified hierarchical hybrid RS-IUS architecture proposed in recent literature, see Fig. 3 (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi, 2011a; Baraldi, 2011b; Baraldi et al., 2010c).
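The two-stage arrangement recommended above can be sketched as follows. The decision rules and thresholds in `first_stage` are hypothetical placeholders chosen for illustration and are not the SIAM™ rule set; likewise, `mean_nir` is only a trivial stand-in for whatever class-specific algorithm would run at the second stage.

```python
def first_stage(pixel):
    """Deductive first stage: hypothetical prior-knowledge rules that map a
    (red, near-infrared) reflectance pair to a semi-concept, with no training."""
    red, nir = pixel
    if red < 0.10 and nir < 0.10:
        return "water or shadow"
    if nir > 2.0 * red:
        return "vegetation"
    return "bare soil or built-up"


def second_stage(pixels, labels, stratum, algorithm):
    """Stratified second stage: run a class-specific algorithm only on the
    pixels masked by one semi-symbolic stratum of the first-stage map."""
    masked = [p for p, lab in zip(pixels, labels) if lab == stratum]
    return algorithm(masked)


def mean_nir(masked):
    """Placeholder second-stage algorithm: mean near-infrared reflectance."""
    return sum(nir for _, nir in masked) / len(masked)


# Hypothetical (red, nir) reflectances for four pixels.
pixels = [(0.05, 0.40), (0.03, 0.04), (0.25, 0.30), (0.06, 0.50)]
labels = [first_stage(p) for p in pixels]  # preliminary classification map
veg_mean_nir = second_stage(pixels, labels, "vegetation", mean_nir)
```

In this sketch any inductive algorithm passed as `algorithm` only ever sees data that the first stage has already masked by a semi-concept, which is the sense in which it works on a driven-by-knowledge stratified basis.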

# **6. SIAM™ as a proof of the efficacy of the required shift of learning paradigm from MAL-from-examples to MAT-by-rules at the first stage of two-stage hybrid RS-IUSs**

To the best of this author's knowledge SIAM™ provides the first experimental proof of the efficacy of the required switch of learning paradigm from MAL-from-examples to MAT-by-rules at the first stage of a two-stage hybrid RS-IUS architecture (refer to Part II Section 2.3), see Table 4. SIAM™ is an operational (good-to-go, press-and-go, turnkey) software button (executable). In particular, SIAM™ is automatic, efficient, scalable, accurate and robust to changes in the input data acquired across time, space and sensors. For example, the automatic SIAM™ is consistent and accurate across sensors at the national/continental/global scale (refer to Part II Section 2.3) (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b), whereas semi-automatic inductive data learning neural network approaches, such as SVMs, must be re-trained (supervised) image-wide (Chengquan Huang et al., 2008).

SIAM™ belongs to the family of physical models that follow the physical laws of the real (3-D) world to represent an abstract of the reality (see Part I Section 2.1) (Shunlin Liang, 2004). In particular, SIAM™ follows the physical laws of spaceborne optical imaging devices to provide a two-stage hybrid RS-IUS with a first-stage deductive prior knowledge-based inference mechanism. Unfortunately, it takes a long time for human experts to learn physical laws of the real (3-D) world and tune physical models based on human intuition, domain expertise and evidence from data observations (Mather, 1994; Shunlin Liang, 2004). For example, the development of the SIAM™ dates back to the year 2002 (Baraldi, 2011a).


Table 4. QIs of SIAM™ versus state-of-the-art RS-IUSs (refer to Part I Section 2.8). Legend of fuzzy sets: Very low (VL), Low (L), Medium (M), High (H), Very High (VH). Legend of colors: Red-Bad, Blue-Average, Green-Good.

Part I Section 2.2.2 reported the question: is human biology as irrelevant to AI research as bird biology is to aeronautical engineering? Actually, biological vision has always represented a fundamental source of inspiration for the CV community. While SIAM™ considers its degree of biological plausibility as a value added, straightforward imitation of biological vision solutions is not always possible. This is the reason why SIAM™ cannot be considered highly plausible in biological terms although it is very useful in practice. For example, SIAM™ cannot work with panchromatic imagery whereas the human visual system is perfectly able to interpret gray-tone images.

# **7. Conclusions**

It is well known that semantic information is not in objective sensory data, which is tantamount to saying there is a well-known information gap between semantic2 information and physical information. This conceptual work observes that semantic2 information is naturally (automatically, instantaneously) generated by the simultaneous interaction of a subjective external supervisor who observes and scrutinizes an objective sensory data set based on his/her own subjective prior knowledge base (ontology, model of the 3-D world). Semantic2 information resulting from this interaction takes the intermediate form of semi-symbolic secondary data structures that incorporate physical information at the bottom level (layer 0) of an ontology represented as an inverted tree.

A shift of learning paradigm from MAL-from-examples to MAT-by-rules in the first stage of two-stage hybrid RS-IUSs is recommended. Experimental proof of this concept is provided by the operational automatic SIAM™ recently proposed in RS literature.

The practical conclusion of this conceptual work is twofold.

	- a. replaced by a deductive MAT-by-rules approach where community-agreed prior knowledge is conveyed and,
	- b. if useful, adapted to work on a driven-by-knowledge stratified (semantic masked) basis and moved to the second stage of a two-stage stratified hierarchical hybrid cognitive system. For example, a two-stage stratified hierarchical hybrid RS-IUS architecture has been proposed in recent literature, see Fig. 3 (Baraldi et al., 2006; Baraldi et al., 2010a; Baraldi et al., 2010b; Baraldi et al., 2010c; Baraldi, 2011a; Baraldi, 2011b).

This required shift of the learning paradigm from MAL-from-examples to MAT-by-rules adopted in the first stage of a two-stage hybrid RS-IUS is similar in nature to previous conceptual shifts occurring between deductive coarse-to-fine (from symbolic concepts to sub-symbolic data) AI/MAI and inductive fine-to-coarse (from sub-symbolic data to symbolic concepts) Cybernetics/MAL, see Part I Section 2.2. What is novel about the proposed shift of the learning paradigm from MAL-from-examples to MAT-by-rules at the first stage of a two-stage hybrid RS-IUS is the following.


subjective prior knowledge base (ontology or model of the 3-D world (Matsuyama & Shang-Shouq Hwang, 1990)) provided by an external subjective supervisor (human, God or equivalent machine), refer to Part II Section 4.


To summarize, to the best of this author's knowledge this is the first time a novel computational theory (RS-IUS architecture) is supported by operational (good-to-go, press-and-go, turnkey) algorithmic and implementation solutions as proofs of concept. For example, this was not the case of the Marr (Marr, 1982) or the Diamant CV systems (Diamant, 2005; Diamant, 2008; Diamant, 2010a; Diamant, 2010b), whose computational theories (see Part I Section 2.6) are both inconsistent with algorithmic solutions adopted by their authors. As a consequence, these two CV systems become two more instances of the well-known class of two-stage segment-based hybrid CV systems, also termed GEOBIA systems, traditionally affected by a lack of general consensus and research (Hay & Castilla, 2006; Matsuyama & Shang-Shouq Hwang, 1990).

The proposed conclusions of potential interest to the RS, CV, AI and MAL communities are supported by unquestionable independent sources of evidence listed below.

	- Since the late 1950s, the original ambitious goals of AI/MAI and Cybernetics/MAL have been fragmented into "practical" and "manageable" problems equivalent to "a family of relatively disconnected efforts" (Diamant, 2005; Diamant, 2008; Diamant, 2010a; Diamant, 2010b).

	- Unlabeled (unsupervised) data learning algorithms, namely, unlabeled data clustering (Backer & Jain, 1981; Baraldi & Alpaydin, 2002a; Baraldi & Alpaydin, 2002b; Cherkassky & Mulier, 2006; Fritzke, 1997) and unlabeled (2-D) image segmentation algorithms (Burr & Morrone, 1992; Corcoran et al., 2010; Corcoran & Winstanley, 2007; Delves et al., 1992; Hay & Castilla, 2006; Matsuyama & Shang-Shouq Hwang, 1990; Petrou & Sevilla, 2006; Vecera & Farah, 1997), are recognized as inherently ill-posed problems subjective in nature by a relevant portion of existing literature.
	- Labeled (supervised) data learning classifiers are unable to establish correlation relationships between objective sensory (e.g., RS) data and categorical variables (e.g., land cover classes) at large data scale or fine semantic granularity. For example, in (Chengquan Huang et al., 2008) a forest/non-forest one-class SVM battery of classifiers must be re-trained and re-selected for every image in an image mosaic at global scale. Vice versa, labeled data learning classifiers are exclusively suitable for finding correlation relationships between objective sensory data and categorical variables at small data scale and coarse semantic granularity (e.g., in RS data mapping problems at coarse spatial resolution and local/regional scale). In fact, in practical RS data applications where supervised data learning algorithms are employed at large spatial scale, fine spatial resolution or fine semantic granularity (Chengquan Huang et al., 2008), the cost, timeliness, quality and availability of adequate reference (training/testing) datasets derived from field sites, existing maps and tabular data have turned out to be the most limiting factors on RS data product generation and validation (Gutman et al., 2004).

To the best of this author's knowledge, while the proposed practical conclusions of potential interest to the RS, CV, AI and MAL communities are supported by the aforementioned independent sources of evidence, these conclusions are not contradicted by any practical achievement gained by the RS, CV, AI and MAL communities in recent years. Thus, rather than being agreed or disagreed upon, these conclusions ought to be accepted by the scientific community unless proved otherwise when the increasing rate of collection of RS data of enhanced spatial, spectral and temporal quality will no longer outpace our capability of generating (rather than extracting) semantic2 information from RS data provided, *per se*, with no semantics at all.

# **8. Acknowledgments**

This material is partly based upon work supported by the National Aeronautics and Space Administration under Grant/Contract/Agreement No. NNX07AV19G issued through the Earth Science Division of the Science Mission Directorate. The research leading to these results has also received funding from the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement n° 263435. This author wishes to thank the Editorial Board of InTech for its competence and willingness to help.

#### **9. References**


remotely-sensed images, *IEEE Trans. Geosci. Remote Sensing*, accepted for publication, July 2011.



http://calvalportal.ceos.org/CalValPortal/showQA4EO.do?section=qa4eoIntro




**Earth Observation** Edited by Dr. Rustam Rustamov

ISBN 978-953-307-973-8 Hard cover, 254 pages **Publisher** InTech **Published online** 27, January, 2012 **Published in print edition** January, 2012

Today, space technology is used as an excellent instrument for Earth observation applications. Data is collected using satellites and other available platforms for remote sensing. Remote sensing data collection detects a wide range of electromagnetic energy which is emitting, transmitting, or reflecting from the Earth's surface. Appropriate detection systems are needed to implement further data processing. Space technology has been found to be a successful application for studying climate change, as current and past data can be dynamically compared. This book presents different aspects of climate change and discusses space technology applications.

#### **How to reference**

In order to correctly reference this scholarly work, feel free to copy and paste the following:

Andrea Baraldi (2012). Vision Goes Symbolic Without Loss of Information Within the Preattentive Vision Phase: The Need to Shift the Learning Paradigm from Machine-Learning (from Examples) to Machine-Teaching (by Rules) at the First Stage of a Two-Stage Hybrid Remote... Part II, Earth Observation, Dr. Rustam Rustamov (Ed.), ISBN: 978-953-307-973-8, InTech, Available from: http://www.intechopen.com/books/earthobservation/vision-goes-symbolic-without-loss-of-information-within-the-preattentive-vision-phase-part-ii


© 2012 The Author(s). Licensee IntechOpen. This is an open access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.